SUBSTITUTE SPECIFICATION

Locally Distributed Speech Recognition System And 

Method Of Its Operation 

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] The invention relates generally to a distributed speech recognition 

system. It also relates generally to a speech recognition system for use in a
cellular phone network. In particular, the present invention relates to a speech
recognition system for the input of short messages. In further detail, the present
invention is related to a speech recognition system in a cellular phone network for 
transmitting short speech messages without the use of speech transmission 
channels. 

Description of the Prior Art 
[0002] The spread of cellular phones and the large scale integration of 

electronic devices in recent years have led to widespread use of a telematic
service called short message service (SMS). This service is used to transfer short 
messages from one cellular phone to another. It is also possible to transfer a short 
message to an e-mail address. Short messages (SM) presently used in the Global
System for Mobile communication (GSM) cellular phone network comprise a
maximum of 160 characters. By chaining several short messages, even
longer texts can be transferred via SMS.
[0003] The standard procedure to input SM in a GSM-phone is to use the

keyboard. The use of a standard GSM-phone keyboard is time-consuming and
requires the total attention of the user. Even the use of an input routine, such as the
T9-logic, does not obviate these drawbacks. If the SM could be spoken, the input time
and the demand on the user's attention could be considerably reduced.

[0004] Currently used speech recognition systems are not operable in cellular 

phones, due to insufficient processing power, battery capacity, etc. 
[0005] Standard speech recognition systems capable of converting 

spontaneous speech into written text and known as "Large Vocabulary Continuous 
Speech Recognition (LVCSR) systems" require huge storage capacity and complex 
computing devices. Such systems cannot be integrated in a single cellular phone.
[0006] Conventional speech recognition systems are developed to attain a 

reliable conversion of spontaneous speech into written text. One approach is to 
increase the accuracy of the single operations in a speech recognition system. 
Conventional speech recognition systems consist of a subdevice for phoneme 
recognition, and a subdevice for word recognition, which devices are closely 
connected. A phoneme is one of a group of distinctive sounds that make up a word of 
a language. It is assumed that a phoneme recognition system is also capable of
recognizing intervals. The major approach is to reach complete accuracy in both
the phoneme recognition and the word recognition process.

[0007] Conventional phoneme recognition systems use adaptive interactive 

neuronal networks that have to be trained for an accurate recognition of phonemes.
Other phoneme recognition systems use modular time delay neuronal networks.
While these systems have been considerably improved recently, the accuracy is
limited to 80 percent consistency. A background reference is "Speaker-Independent
Phoneme Recognition Using Large Scale Neuronal Networks" by Nakamura, S.;
Sawai, H.; Sugiyama, M., in Acoustics, Speech, and Signal Processing, 1992
(ICASSP-92), 1992 IEEE International Conference, Volume 1, 1992, pages 409-412.
[0008] Most efforts to increase the accuracy employ a tight feedback between

the phoneme and the word recognition system. This may include integrating the phoneme
recognition and the word recognition in a single system. As a result of these
efforts, the complexity of the speech recognition device increases heavily,
while the accuracy does not increase correspondingly.

[0009] It may be possible to transmit a speech signal from a cellular phone via 

a speech channel directly to a centralized speech recognition system. Such a 
centralized conventional speech recognition system cannot be used, however, in a
GSM cellular phone network due to the transfer procedure of coding, transmitting and
decoding, wherein important characteristics of the speech signal get lost.
Additionally, the bandwidth of the speech transmission channels is limited. The
transmission channels act as a band pass filter, so that high
and low frequencies of speech are not transmitted. The
speech recognition system, however, requires these frequencies. The loss
of important characteristics and the restricted bandwidth of the transmission lead to
an unacceptable loss in speech recognition accuracy, so this procedure of converting
a speech signal into readable text is not useful.

[0010] Hence, a speech recognition system having a good accuracy cannot

be integrated in a cellular phone, due to its complexity, space demand and battery
load.

[0011] One approach in order to solve the problem of a cellular phone based 

speech recognition system is recited in WO 00/22610. This document describes in 
particular the disadvantages of a speech recognition system integrated in a cellular 
phone. It also describes the drawbacks of a speech recognition system due to the 
bandwidth of the GSM. It further describes a method of feature extracted parameter 
compression for the transfer of speech to a speech recognition system. The 
described apparatus and method use a speech channel for the transmission of 
feature extracted parameters of the speech waveform. The feature extracted 
parameters are transferred to a speech recognition system. The speech recognition
system comprises a phoneme and a word recognition system. The main drawbacks of
this system are the requirement of a whole speech channel for the transmission
between the mobile communication device and the interpreting component, the need
for a new transmission protocol and the requirement for continuous power amplifier
operation.

[0012] The problem underlying the invention is to find a method and an 

apparatus for a speech recognition system adapted for the speech input of short 
messages into a cellular or mobile phone communication network. 
[0013] Further, it is desired to simplify the system and to increase the speed of 

the input process. 

SUMMARY OF THE INVENTION 
[0014] The invention solves the problem by a locally distributed speech 
recognition system. 

[0015] According to another aspect of the invention the problem is solved by 

an interpreting component. 

[0016] According to yet another aspect of the invention the problem is solved 

by a mobile communication device. 

[0017] Methods for operating the above devices are also provided. 

[0018] Speech recognition according to the invention is split into a preliminary 

recognition component integrated in a mobile communication device, a transmission 
facility and a remote interpreting component. The transmission facility connects the 
mobile communication device to the interpreting component and vice versa. 
[0019] The transmission facility can be a cellular phone network, a Global 

System for Mobile Communication (GSM) network, a Universal Mobile 
Telecommunication System (UMTS) network, the internet, the World Wide Web, or 
other wide area networks. It could also be a local area network such as an intranet, or a
short distance transmission system between a computer and a peripheral device, e.g.
a Bluetooth™ system. The mobile communication device can be a cellular phone
with a short message feature as well as a mobile computer with a connection to a
network. The transfer code could be a text format such as ASCII or the code used
in the Short Message System of GSM networks, or any other text code.
[0020] In a preferred embodiment of the invention the mobile communication 

device comprises a digital signal processing component being connected to the 
preliminary recognition component. By using the preliminary recognition component 
in a mobile communication device, the preliminary recognition process can be 
supported by a digital speech waveform processing component. Especially in cellular 
phones a digital signal processing component (DSP) can be included in the 
transceiver of the cellular phone. In addition the preliminary code can be compressed 
to reduce its length. 

[0021] The locally distributed speech recognition system provides a 

component for the re-transmission of the digitized readable text back to the user, 
wherein the re-transmission component is connected to the interpreting component. 
Thereby it is possible that the user checks and approves or rejects any insufficiently 
recognized text. 

[0022] Preferably the preliminary recognition system comprises a neural or 

neuronal network or a time delay neuronal network. By using a neuronal network or a 
time delay neuronal network in the preliminary recognition system, the best suited 
computing structure is chosen to solve the problem of speech recognition as 
effectively as possible. The preliminary recognition component preferably comprises
a phoneme recognition component for generating phonemes out of spoken language.
[0023] Advantageously, the neuronal network is interactively adaptive and/or 
comprises a modular structure. By using an adaptive interactive neuronal network, 
the user can adapt the user's personal mobile communication device to the user's 
personal pronunciation. Thus, the accuracy of the preliminary recognition can be 
improved. By using a modular neuronal network the best accuracy in preliminary 
recognition is attained. 
[0024] Conveniently the mobile communication device, the preliminary

recognition system and/or the interpreting component comprise a conversion 
component for converting between different codes, e.g. ASCII, SMS, etc. By using a 
conversion component, any transmission problems due to transfer protocols or 
differing codes in information exchange can be solved. 

[0025] Preferably, the preliminary recognition component, the mobile 

communication device and/or the interpreting component comprise a storage 
component. By using a storage component, the locally distributed speech recognition 
system is able to transfer the recognized phonemes during speech intervals. This 
reduces the operation time of the transmitter of the mobile communication device to a 
minimum. Using a buffer between the speaker and the preliminary recognition 
component enables the system to continuously recognize phonemes, and to transfer 
and receive the code during speech intervals. 

[0026] Advantageously the code transfer between the mobile communication 

device and the interpreting component is achieved by a teleservice. Conveniently the 
used teleservice is a short message system. 

[0027] By using a teleservice, the locally distributed speech recognition system can be

used by a cellular phone service provider for an easier and faster way of generating 
short messages. The providers of cellular phone networks benefit from an increased 
number of short messages. The teleservice can be a facsimile, short message
system (SMS), General Packet Radio Service, or any other teleservice capable of
transferring text, including ones not yet introduced.

[0028] Preferably, the interpreting component is directly connected to or 

included in a network. It can be connected to an SMS central station. 
[0029] By connecting the interpreting component with a network, a plurality of 

mobile communication devices can use a single interpretation device. This enables 
the installation of a central speech recognition system in cellular phone networks, to 
comply with the requirement of low costs for the single user connected to the central 
speech recognition system. 



[0030] In an alternative embodiment the interpreting component is remote in 

the network. By using a remote interpreting component the provider of a network 
benefits from the fact that even in a case of a failure or a breakdown of a single 
interpreting component the speech recognition system maintains operation. 
[0031] Conveniently the interpreting component comprises a word recognition 

component. 

[0032] Preferably the interpreting component comprises a grammar 

recognition component. 

[0033] Advantageously the interpreting component comprises a syntax 

recognition component. By using word, grammar, and syntax recognition systems, 
which are preferably connected to each other, the interpreting component can 
generate possible interpretations from defective preliminary codes. For generating 
short messages with fewer than 160 characters this can be a powerful component for
the speech recognition. Due to the brevity of the message, the words, grammar and
syntax which are used are less complex than in ordinary speech and the preceding
preliminary recognition proves satisfactory in association with such an interpreting
component.

[0034] Advantageously the component for the transfer of data is designed to 
transfer the data in accordance with a transfer protocol, especially that of the short
message system. 

[0035] By using the short message system transfer protocol, the system can 

be used in existing GSM cellular phone networks. The main advantage is that the 
system can be used worldwide, because the GSM standard is used worldwide.
[0036] Preferably the interpreting component uses a discrete hidden Markov

model for interpreting the received coded phonemes. A discrete hidden
Markov model provides a suitable basis for the word recognition.
[0037] According to another aspect of the invention, the speech recognition is

achieved by an interpreting component for use in a locally distributed speech 
recognition system comprising an input for receiving digitally coded phonemes from a 
remote preliminary recognition component, an output for digitally coded readable text,
and databases for orthography, grammar and syntax. 

[0038] According to another aspect of the invention, the speech recognition is

achieved by a mobile communication device for use in the locally distributed
speech recognition system comprising an acoustic coupler for transferring an 
acoustic voice waveform into an electronic waveform, a preliminary recognizing 
component for extracting phonemes contained in this waveform, a converting 
component for converting the extracted phonemes into code and a transmitting 
component for transmitting the code. 

[0039] A preferred embodiment of a mobile communication device according 

to the invention further comprises a component to receive data transferred from the 
interpreting component. This enables the user to verify the recognized text for 
accuracy. 

[0040] According to another aspect of the invention a method for operating a

locally distributed speech recognition system for use with a transmission facility
comprises the operations of: 

- Recognizing the phonemes and intervals of the speech; 

- Converting the phonemes and intervals into code; 

- Transferring the code to a remote interpreting component; 

- Interpreting the code to generate digitized readable text; 

- Transferring the digitized readable text back to the user; 

- Checking the digitized readable text by the user; 

- Accepting or rejecting said text by the user; and 

- Dispatching an acceptance/rejection signal to the interpreting component. 
[0041] After recognizing the phonemes and intervals in the mobile 
communication device, the phonemes are converted into code. The code is 
transferred via a transmission facility to a remote interpreting component. The 
transmission facility can be a communication network such as the internet or cellular 
phone networks. The interpreting component generates readable text from the code. 
[0042] Preferably, the method further comprises one of the following

operations of: 

- Supporting the recognizing process by digitally processing the waveform of the 
speech input; 

- Storing the code; 

- Limiting the number of recognized phonemes to a predetermined amount; and 

- Generating a short message containing the phonemes. 

[0043] By supporting the preliminary recognition process with a digital signal 

processor, the accuracy of the recognition process may be improved. Digital signal 
processors are included in transceivers of conventional mobile communication 
devices used in GSM cellular phone networks. During the preliminary recognition 
process, the mobile communication device has to be idle to prevent self-interference.
Hence the transceiver of the mobile communication device is in an idle mode during 
the preliminary recognition process. Therefore the digital signal processor can be 
used to process the speech waveform during preliminary recognition. A short time 
delay component upstream of the preliminary recognition component can detect 
speech intervals that can be used to transfer the code via short message system to 
the interpreting device. By counting the phonemes in the mobile communication 
device, the system can communicate to the user that the length of a short message 
was exceeded. By limiting the number of recognized characters, the user can select 
whether the short message should be sent in one or several short message packets
to the recipient. The code has to be stored for continuous preliminary recognition and 
simultaneous transmission to the interpreting component. Generating a short 
message from the code enables the mobile communication device to use a non- 
speech channel for the transmission to the interpreting component. The short 
message can contain a code sequence identifying the subsequent characters as 
phonemes. 
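
By way of illustration only, the counting and packet-splitting described above may be sketched in Python as follows. The marker string, the function names and the assumption that each unit of code occupies one character are hypothetical and are not taken from the description; only the limit of 160 characters is stated in the text.

```python
from typing import List

SMS_MAX_CHARS = 160      # limit stated for GSM short messages
PHONEME_MARKER = "#PH#"  # hypothetical code sequence marking the payload as phonemes


def exceeds_single_message(code: str) -> bool:
    """Return True when the marked code no longer fits into one short message."""
    return len(PHONEME_MARKER) + len(code) > SMS_MAX_CHARS


def split_into_short_messages(code: str) -> List[str]:
    """Split phoneme code into marker-prefixed packets of at most 160 characters."""
    payload_size = SMS_MAX_CHARS - len(PHONEME_MARKER)
    return [
        PHONEME_MARKER + code[start:start + payload_size]
        for start in range(0, len(code), payload_size)
    ]


code = "x" * 400  # placeholder standing in for 400 characters of phoneme code
packets = split_into_short_messages(code)
print(exceeds_single_message(code), [len(p) for p in packets])
```

In such a sketch the result of exceeds_single_message could be used to inform the user that the length of one short message was exceeded, before the user chooses between one and several packets.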

[0044] Preferably the method further comprises at least one of the following 

operations of: 
- Receiving an acceptance/rejection signal by the interpreting component;

- Re-Interpreting the code to generate a different digitised readable text; 

- Post-Processing of an accepted digitized readable text by the user; 

- Storing said post processed digitized readable text; 

- Dispatching said digitized readable text or said post-processed digitized 
readable text by the user; 

- Transferring a command from the user to the interpreting component for 
dispatching an accepted digitised readable text to a recipient; 

- Dispatching an accepted digitised readable text to a recipient; 

- Receiving and storing information related to the origin of the code for 
improving the interpreting process; 

- Receiving and storing the accepted and/or post-processed digitized readable 
text for updating the databases; and 

- Processing of stored data for improving the accuracy of the interpreting 
process. 

[0045] By transferring the digitized readable text back to the user, the user can 

check whether the recognized text is in accordance with the spoken text. If the 
readable text diverges too much from the spoken text, the user can send a rejection
signal to the interpreting component. The rejection signal causes the interpreting 
component to restart interpretation and to generate a differing readable text from the 
code. This procedure is repeated until a readable text is accepted. This text can be 
sent to a recipient. It may be sufficient to transfer a dispatching command to the 
interpreting component. If the readable text diverges slightly from the spoken text, the 
user may accept the text, post-process the text and send it to a recipient. 
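
The acceptance and rejection procedure described above may be sketched, purely as an illustration, as a loop over candidate interpretations. The function name, the callback and the example texts are assumptions made for this sketch; in the described system the candidates and the acceptance or rejection signals would travel over the transmission facility.

```python
from typing import Callable, Iterable, Optional


def negotiate_text(
    interpretations: Iterable[str],
    user_accepts: Callable[[str], bool],
) -> Optional[str]:
    """Offer candidate texts in order until the user accepts one.

    Returns the accepted text, or None if every candidate was rejected.
    """
    for text in interpretations:
        if user_accepts(text):  # acceptance signal from the user
            return text
        # rejection signal: continue with the next interpretation
    return None


# Hypothetical candidates produced by the interpreting component
candidates = ["meat me at ate", "meet me at eight"]
accepted = negotiate_text(candidates, lambda text: text == "meet me at eight")
print(accepted)  # meet me at eight
```
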
[0046] By transferring a post-processed short message back to the interpreting 

component, the interpretation accuracy may be improved significantly. Especially the 
recognition of names and nicknames can be improved, if the interpreting component 
uses this information related to the original phoneme code. The system may be
capable of recognizing all names with the help of information relating to the origin and
the address of the short message.

[0047] According to another aspect of the invention a method is provided for 

operating an interpreting component for use with a transmission facility and a
remote mobile communication device, comprising the operations of:

- Receiving code containing phonemes from the mobile communication device; 

- Interpreting the code to generate digitised readable text in accordance with 
predetermined rules; 

- Dispatching said digitized text to said mobile communication device; 

- Approving or rejecting the digitized readable text by the user; and 

- Receiving an approval/rejection message from said mobile communication 
device. 

[0048] Preferably, the method further comprises at least one of the following 

operations of: 

- Storing the code; 

- Storing the digitized readable text;

- Transferring the digitized readable text to the recipient; 

- Storing the information related to the origin of the code; 

- Receiving and storing the rejected, accepted and/or post processed digitized 
readable text; and 

- Processing of the stored data to improve the interpretation process. 

[0049] Advantageously the interpretation of the code is supplemented in 

accordance with orthography, grammar, and/or syntax. 

[0050] By using orthography, grammar and syntax databases, the interpreting 

component may be capable of interpreting garbled code. The accuracy of the
interpretation process may be improved. It may be necessary to use a special 
orthography, grammar and syntax, due to the shortness of the messages. 
[0051] Preferably, the interpretation of the code is executed in accordance with 

orthography, grammar and syntax of a specific language selected by the user.
[0052] By using orthography, grammar and syntax of a specific language,

selected by the user, the system can be used by tourists, to generate short 
messages. Especially for the use of the system in multilingual countries, like 
Switzerland, a language selection can be related to the subscriber identification 
module (SIM) of the mobile communication device. 

[0053] Preferably the preliminary recognition component distinguishes vowels, 

consonants, intervals and probabilities. 

[0054] By using not only the phonemes as an input, but also intervals, the 

accuracy of the recognition process may be improved. Further improvement may be 
reached, if the accuracy of the recognition of each phoneme is quantified as a 
probability and transmitted to the interpreting component, too. Probabilities may vary 
from zero, which is "not recognized", to 1.0, which is "surely recognized". In the case that
instead of one phoneme, a multitude of phonemes with differing probabilities are 
recognized, only the most probable phoneme will be transferred to the interpreting 
component. Alternatively, with sufficient data transfer capacities, an algorithm can be 
used to determine if different phonemes together with their probabilities are 
transferred to the interpreting component. 

[0055] For example, if two differing phonemes PH1, with the probability 0.6,

and PH2, with the probability 0.9, are recognized, the algorithm only transfers the
phoneme PH2. If the preliminary recognition system detects, however, a probability
of 0.7 for PH1 and a probability of 0.6 for PH2, it is useful that the algorithm causes
both phonemes together with their probabilities to be transferred to the interpreting
component. So if the interpreting component cannot form a readable text using PH1,
it will automatically be replaced by PH2. The algorithm and this kind of transfer
procedure make a closed feedback loop between the preliminary recognition
component and the interpreting component unnecessary.
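
One possible form of such an algorithm is sketched below in Python, for illustration only. The rule used here, namely that alternatives are transferred when their probability lies within a fixed margin of the most probable phoneme, and the margin value of 0.2 are assumptions chosen so that the sketch reproduces the two examples given above; the description does not specify the decision rule.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (phoneme label, recognition probability)


def select_candidates(candidates: List[Candidate], margin: float = 0.2) -> List[Candidate]:
    """Return the candidates to transfer for one phoneme position.

    The most probable phoneme is always transferred. Alternatives are added
    only when their probability lies within `margin` of the best one.
    The margin is a hypothetical threshold, not a value from the description.
    """
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_probability = ranked[0][1]
    return [c for c in ranked if best_probability - c[1] <= margin]


print(select_candidates([("PH1", 0.6), ("PH2", 0.9)]))  # only PH2 is transferred
print(select_candidates([("PH1", 0.7), ("PH2", 0.6)]))  # both are transferred
```
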

[0056] Preferably the phoneme code is compressed prior to transmittal to the
interpreting component. 
[0057] By compressing the code prior to transmittal, the number of transmitted

short messages may be reduced, to prevent the provider or the network from being 
overloaded. This may be carried out by a system which marks a single phoneme and 
transfers it together with a position code. So instead of transferring the same 
phoneme several times, the system transfers the phoneme once followed by a 
position code. For example the phoneme "PH" is transferred as "PH,
phoneme position 3,6,8" instead of "..PH..PH.PH.." in the short message. Any other
compression procedure suitable for short messages can be used. 
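
The position-code scheme of the preceding paragraph may be sketched in Python as follows, for illustration only. The function names and the dictionary representation are assumptions; the description does not define the actual wire format of the position code.

```python
from typing import Dict, List


def compress(phonemes: List[str]) -> Dict[str, List[int]]:
    """Map each distinct phoneme to the 1-based positions where it occurs."""
    positions: Dict[str, List[int]] = {}
    for index, phoneme in enumerate(phonemes, start=1):
        positions.setdefault(phoneme, []).append(index)
    return positions


def decompress(positions: Dict[str, List[int]]) -> List[str]:
    """Rebuild the original phoneme sequence from the position lists."""
    length = sum(len(indices) for indices in positions.values())
    sequence = [""] * length
    for phoneme, indices in positions.items():
        for index in indices:
            sequence[index - 1] = phoneme
    return sequence


# Hypothetical sequence in which "PH" occurs at positions 3, 6 and 8
sequence = ["A", "B", "PH", "C", "D", "PH", "E", "PH"]
packed = compress(sequence)
print(packed["PH"])  # [3, 6, 8]
assert decompress(packed) == sequence
```

Whether this representation is actually shorter depends on how often phonemes repeat and on how the positions are encoded, since a phoneme occurring only once gains nothing from a position list.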

BRIEF DESCRIPTION OF THE DRAWINGS 
[0058] Further advantages, advantageous embodiments and additional 

applications of the invention are provided in the following description of a preferred 
embodiment of the invention in connection with the enclosed figure. 
[0059] Fig. 1 is a block diagram of a cellular phone network with a distributed

speech recognition system to generate short messages according to the invention.

DETAILED DESCRIPTION OF THE INVENTION
[0060] While the following description is in the context of distributed speech 

recognition systems in cellular phone networks involving portable radio phones, it will 
be understood by those skilled in the art that the present invention may be applied to 
other communication networks, especially the internet, the world wide web or future 
networks. Moreover, the present invention may be used in any speech recognition
application like local area networks (LAN).

[0061] Fig. 1 describes the use of a distributed speech recognition system. 

Spoken words 2 are received by a microphone disposed in a first mobile 
communication device 4 and are transformed into coded phonemes in the first mobile 
communication device 4. The coded phonemes are transferred via a transmission
facility 7 to an interpreting component 10. The transmission facility 7 uses a first 
digital short message radio channel 6 and a first communication network base 
station 8. The transmission facility 7 is a cellular phone network. The interpreting
component 10 receives the coded phonemes and processes them in accordance 
with an orthography database 12, a grammar database 14 and a syntax database 16. 
The interpreting component 10 generates a digitized short message signal from the 
coded phonemes. 

[0062] If the interpretation of the coded phonemes is equivocal, the interpreting 

component 10 generates a plurality of possible digitized readable texts. The most
similar digitized readable text is sent back to the mobile communication device 4, via 
the first network base station 8 and a second digital short message radio channel 18. 
In the first mobile communication device 4 the text is displayed and the user (not
shown) accepts or rejects the readable text. If the user rejects the text, a rejection 
command is issued and retransmitted, whereupon the next possible code 
interpretation is sent to the user, until the user accepts a readable text. Next, the user 
dispatches the approved short message via the transmission facility 7 to a receiving 
mobile communication device 24. 

[0063] The transmission path extends from the mobile communication device 4
via the digital short message radio channel 6 to the base station 8. From the base 
station 8, the message is conveyed via a dedicated line 19 to a second base 
station 20. From the second base station 20, the message is sent via a third short 
message radio channel 22 to the receiving mobile communication device 24. Via this 
path a spoken message can be transformed into a short message and sent to
another mobile communication device to be read as text. 